
datastreams: Port data-streams-go to dd-trace-go #2006

Merged: 18 commits merged into main from piotr-wolski/add-data-streams-support on Sep 6, 2023

Conversation

piochelepiotr
Collaborator

@piochelepiotr piochelepiotr commented May 26, 2023

What does this PR do?

Move the repository https://github.com/DataDog/data-streams-go to the main APM Datadog Go repository.
This way, the two can be better integrated and users only need to install one library.
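
For illustration, a minimal sketch of what usage could look like once everything lives in dd-trace-go. The enabling behavior of tracer.Start() follows the discussion in this PR, and SetDataStreamsCheckpoint is the function shown in the diff later in this conversation; the exact package holding it is an assumption here, not a final API.

package main

import (
	"context"

	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func main() {
	// Per this PR, starting the tracer also starts the data streams
	// processor; no separate library needs to be installed or started.
	tracer.Start()
	defer tracer.Stop()

	// Set a produce checkpoint on the pathway carried by the context.
	// The edge tags describe this hop in the data pipeline.
	_, ctx, ok := tracer.SetDataStreamsCheckpoint(context.Background(),
		"direction:out", "type:kafka", "topic:orders")
	if ok {
		_ = ctx // ctx now carries the pathway, e.g. for Kafka message headers
	}
}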

Motivation

Describe how to test/QA your changes

Reviewer's Checklist

  • Changed code has unit tests for its functionality.
  • If this interacts with the agent in a new way, a system test has been added.

@pr-commenter

pr-commenter bot commented May 26, 2023

Benchmarks

Benchmark execution time: 2023-09-06 13:45:30

Comparing candidate commit 6341dc6 in PR branch piotr-wolski/add-data-streams-support with baseline commit b49b27c in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 40 metrics, 1 unstable metric.

@piochelepiotr piochelepiotr force-pushed the piotr-wolski/add-data-streams-support branch 3 times, most recently from c1fac6c to 0448e4d Compare May 26, 2023 14:22
@piochelepiotr piochelepiotr marked this pull request as ready for review May 26, 2023 14:40
@piochelepiotr piochelepiotr requested a review from a team May 26, 2023 14:40
@piochelepiotr piochelepiotr requested a review from a team as a code owner May 26, 2023 14:40
Contributor

@ahmed-mez ahmed-mez left a comment

As discussed offline, some core functionalities (like container ID parsing and config) need to be refactored first.

@ahmed-mez
Contributor

Hey @piochelepiotr! Any update on this PR? It's been 2 months since the review; we'll probably consider it stale and close it if there is no progress in the next few weeks. Thanks!

@piochelepiotr piochelepiotr force-pushed the piotr-wolski/add-data-streams-support branch from 0448e4d to 3d8ee19 Compare August 2, 2023 19:35
Resolved (outdated) review threads: datastreams/aggregator.go (2), datastreams/aggregator_test.go, ddtrace/ddtrace.go, ddtrace/internal/globaltracer_test.go, ddtrace/internal/noop_datastreams.go
@piochelepiotr piochelepiotr force-pushed the piotr-wolski/add-data-streams-support branch 2 times, most recently from 7735514 to 386f073 Compare August 3, 2023 04:20
Resolved (outdated) review threads: ddtrace/internal/globaltracer.go, datastreams/transport.go
@piochelepiotr piochelepiotr force-pushed the piotr-wolski/add-data-streams-support branch 2 times, most recently from d9f9cfc to ca184db Compare August 3, 2023 14:18
Contributor

@knusbaum knusbaum left a comment

👋
I left an initial review, almost exclusively on how it integrates with the existing tracer.
I'll still have to look more closely at the data streams package itself.

I would like to understand better what this is, why dd-trace-go is a good place for it, and why it requires changes to the tracer itself, whereas previously it was able to exist as a stand-alone module.

Resolved review threads: ddtrace/ddtrace.go, ddtrace/tracer/data_streams.go, internal/statsd.go, ddtrace/tracer/tracer.go (2), ddtrace/tracer/option.go, datastreams/dsminterface/data_streams_interface.go
@piochelepiotr
Collaborator Author

👋 I left an initial review, almost exclusively on how it integrates with the existing tracer. I'll still have to look more closely at the data streams package itself.

I would like to understand better what this is, why dd-trace-go is a good place for it, and why it requires changes to the tracer itself, whereas previously it was able to exist as a stand-alone module.

Thanks for the review!

So the reason to put everything here is that the next step is to use this library in all the integrations.
Tracing Kafka, for example, would automatically add data streams to Kafka.

The reason it requires changes to the tracer is that I want tracer.Start() to also start the data streams processor.
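
For illustration, a rough sketch of how a Kafka integration could then add data streams automatically once the processor is owned by the tracer. SetDataStreamsCheckpoint and TrackKafkaProduceOffset come from the diff shown later in this PR; the wrapper function, its package, and the import path are hypothetical.

package kafkatrace

import (
	"context"

	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

// instrumentProduce is a hypothetical helper: it sets a produce checkpoint
// before handing the message to the Kafka client, then reports the produced
// offset so a lag-in-seconds metric can be derived.
func instrumentProduce(ctx context.Context, topic string,
	produce func(context.Context) (partition int32, offset int64, err error)) error {
	_, ctx, _ = tracer.SetDataStreamsCheckpoint(ctx, "direction:out", "type:kafka", "topic:"+topic)
	partition, offset, err := produce(ctx)
	if err != nil {
		return err
	}
	tracer.TrackKafkaProduceOffset(topic, partition, offset)
	return nil
}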

@ahmed-mez ahmed-mez removed their assignment Aug 8, 2023
@ahmed-mez ahmed-mez removed their request for review August 8, 2023 08:08
Contributor

@ajgajg1134 ajgajg1134 left a comment

I focused on the changes outside the new datastreams folder. Overall I think this seems like a good path forward to simplifying things for customers!

Resolved (outdated) review threads: contrib/Shopify/sarama/data_streams.go (2), ddtrace/tracer/tracer.go (2), ddtrace/tracer/mockdatastreams.go
@piochelepiotr piochelepiotr force-pushed the piotr-wolski/add-data-streams-support branch 2 times, most recently from 3e2fccc to bb750a9 Compare August 25, 2023 20:41
Contributor

@ajgajg1134 ajgajg1134 left a comment

Definitely a bit nicer for customers to reduce the number of interfaces needed here! Just a few smaller change requests / questions :)

Resolved review threads: contrib/Shopify/sarama/sarama.go (2), contrib/Shopify/sarama/sarama_test.go
Contributor

@knusbaum knusbaum left a comment

Some comments to consider, but overall this is a 👍 from me.

Resolved (outdated) review thread: contrib/Shopify/sarama/option.go
Comment on lines 20 to 52
// SetDataStreamsCheckpoint sets a consume or produce checkpoint in a Data Streams pathway.
// This enables tracking data flow & end to end latency.
// To learn more about the data streams product, see: https://docs.datadoghq.com/data_streams/go/
func SetDataStreamsCheckpoint(ctx context.Context, edgeTags ...string) (p datastreams.Pathway, outCtx context.Context, ok bool) {
	return SetDataStreamsCheckpointWithParams(ctx, datastreams.NewCheckpointParams(), edgeTags...)
}

// SetDataStreamsCheckpointWithParams sets a consume or produce checkpoint in a Data Streams pathway.
// This enables tracking data flow & end to end latency.
// To learn more about the data streams product, see: https://docs.datadoghq.com/data_streams/go/
func SetDataStreamsCheckpointWithParams(ctx context.Context, params datastreams.CheckpointParams, edgeTags ...string) (p datastreams.Pathway, outCtx context.Context, ok bool) {
	if t, ok := internal.GetGlobalTracer().(datastreams.ProcessorContainer); ok {
		if processor := t.GetDataStreamsProcessor(); processor != nil {
			p, outCtx = processor.SetCheckpointWithParams(ctx, params, edgeTags...)
			return p, outCtx, true
		}
	}
	return datastreams.Pathway{}, ctx, false
}

// TrackKafkaCommitOffset should be used in the consumer, to track when it acks offset.
// If used together with TrackKafkaProduceOffset it can generate a Kafka lag in seconds metric.
func TrackKafkaCommitOffset(group, topic string, partition int32, offset int64) {
	if t, ok := internal.GetGlobalTracer().(datastreams.ProcessorContainer); ok {
		if p := t.GetDataStreamsProcessor(); p != nil {
			p.TrackKafkaCommitOffset(group, topic, partition, offset)
		}
	}
}

// TrackKafkaProduceOffset should be used in the producer, to track when it produces a message.
// If used together with TrackKafkaCommitOffset it can generate a Kafka lag in seconds metric.
func TrackKafkaProduceOffset(topic string, partition int32, offset int64) {
Contributor

This looks fine to me, but to be clear, this is becoming a public API, even though we are really only using it within the module right now, and we don't really expect users to directly use it.

The consequence of this is that this API cannot change at all once we release it.
If this is not a stable, complete API, and is still being developed at all, it may be better to find some way to hide it so it's not exposed publicly.

Collaborator Author

Yeah, given that not everyone uses the sarama or confluent clients, and not everyone stores offsets in Kafka, I think it's important to leave that option to customers.
The main change I see for this API is an additional argument (Kafka cluster) that we might add in the future.
Maybe I can add it directly, but ignore it for now?

Collaborator Author

Actually, I don't know what the API would be today. Maybe a list of strings, maybe just one string.
So I think what we can do is leave this one as it is, and add another function once we know how we will track clusters.
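
For illustration, one hedged way to do that without touching the existing signature: keep today's function frozen and add a params-based variant later. The struct and the new function name below are purely hypothetical, not part of this PR.

package datastreams

// TrackKafkaCommitOffset keeps today's signature and stays frozen once released.
func TrackKafkaCommitOffset(group, topic string, partition int32, offset int64) {
	// existing behavior, as in this PR
}

// KafkaCommitParams is a hypothetical future parameter struct; a Cluster
// field (or a list of strings) can be added without breaking callers.
type KafkaCommitParams struct {
	Cluster   string // shape of cluster tracking is still undecided
	Group     string
	Topic     string
	Partition int32
	Offset    int64
}

// TrackKafkaCommitOffsetWithParams is the "another function" mentioned above,
// added only once cluster tracking is settled.
func TrackKafkaCommitOffsetWithParams(p KafkaCommitParams) {
	TrackKafkaCommitOffset(p.Group, p.Topic, p.Partition, p.Offset)
}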

Resolved review thread: CODEOWNERS
// This product includes software developed at Datadog (https://www.datadoghq.com/).
// Copyright 2016-present Datadog, Inc.

package datastreams
Contributor

Have you considered putting the public types and functions in a datastreams/internal package instead of exposing everything to users in the current datastreams package? By having an internal package you'll be able to change it freely in the future without any backward-compatibility restrictions. This is also a common practice in dd-trace-go, see ddtrace/internal, profiler/internal and contrib/internal.
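
For illustration, a minimal sketch of that split, assuming a hypothetical datastreams/internal package that owns the implementation; the paths, names, and aliasing approach below are assumptions, not the PR's final layout.

// datastreams/datastreams.go: a deliberately small public surface.
package datastreams

import (
	"context"

	"gopkg.in/DataDog/dd-trace-go.v1/datastreams/internal" // hypothetical implementation package
)

// Pathway aliases the internal type so the implementation can evolve
// freely without breaking the public API.
type Pathway = internal.Pathway

// SetCheckpoint is the only entry point users need; everything else
// stays behind the internal boundary.
func SetCheckpoint(ctx context.Context, edgeTags ...string) context.Context {
	return internal.SetCheckpoint(ctx, edgeTags...)
}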

Resolved review threads: datastreams/processor.go, contrib/Shopify/sarama/sarama.go (2)
@piochelepiotr piochelepiotr force-pushed the piotr-wolski/add-data-streams-support branch from 07a9894 to 6b36a9c Compare September 5, 2023 19:35
@ajgajg1134 ajgajg1134 merged commit 3ebb83f into main Sep 6, 2023
50 checks passed
@ajgajg1134 ajgajg1134 deleted the piotr-wolski/add-data-streams-support branch September 6, 2023 13:54
KyleWiering added a commit to KyleWiering/dd-trace-go that referenced this pull request Feb 26, 2024
Pulled the changes from DataDog#2006 into the feature branch